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Abstract 

All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, 
existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence 
change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free 
parameters. Here, I demonstrate an alternative: Experimental determination of a parameter-free evolutionary model via 
mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza 
nucleoprotein that describes the gene phytogeny far better than existing models with dozens or even hundreds of free 
parameters. Emerging high-throughput experimental strategies such as the one employed here provide fundamentally 
new information that has the potential to transform the sensitivity of phylogenetic and genetic analyses. 
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Introduction 

The phylogenetic analysis of gene sequences is one of the 
most important and widely used computational techniques 
in all of biology. All modern phylogenetic algorithms require a 
quantitative evolutionary model that specifies the rate at 
which each site substitutes from one identity to another. 
These evolutionary models can be used to calculate the sta- 
tistical likelihood of the sequences given a particular phylo- 
genetic tree (Felsenstein 1973). Phylogenetic relationships are 
typically inferred by finding the tree that maximizes this like- 
lihood (Felsenstein 1981) or by combining the likelihood with 
a prior to compute posterior probabilities of possible trees 
(Huelsenbeck et al. 2001). 

Actual sequence evolution is governed by the rates at 
which mutations arise and the selection that subsequently 
acts upon them (Halpem and Bruno 1998; Thorne et al. 2007). 
Unfortunately, neither of these aspects of the evolutionary 
process are traditionally known a priori. The standard 
approach in molecular phylogenetics is therefore to assume 
that sites evolve independently and identically, and then con- 
struct an evolutionary model that contains free parameters 
designed to represent features of mutation and selection 
(Goldman and Yang 1994; Yang 1994; Yang et al. 2000; 
Kosiol et al. 2007). This approach suffers from two major 
problems. First, although adding parameters enhances a 
model's fit to data, the parameter values must be esti- 
mated from the same sequences that are being analyzed 
phylogenetically — and so complex models can overfit the 
data (Posada and Buckley 2004). Second, even complex 
models do not contain enough parameters to realistically 
represent selection, which is highly idiosyncratic to specific 
sites within a protein. Attempts to predict site-specific selec- 
tion from protein structure have had limited success 
(Rodrigue et al. 2009; Kleinman et al. 2010), probably because 



even sophisticated computer programs cannot reliably pre- 
dict the impact of mutations (Potapov et al. 2009). 

Methods have been developed to infer site-specific selec- 
tion from naturally occurring sequences (Rodrigue et al. 2010; 
Tamuri et al. 2012, 2014). Because the number of possible 
mutations is large, steps must be taken to ensure that these 
methods do not overfit the data (Rodrigue 2013). However, 
even when such steps are taken, the inferred site-specific se- 
lection parameters cannot easily be applied to phylogenetic 
analyses. The reason is that the selection parameters are gen- 
erally inferred from the same naturally occurring sequences 
that are of phylogenetic interest — and parameters inferred 
from a data set cannot be used to analyze that same data 
set without additional procedures to avoid overfitting. The 
procedures that have been devised to restrain this problem of 
proliferating free parameters are complex and generally re- 
quire assuming that sites fall into only a limited number of 
different classes (Lartillot and Philippe 2004; Le et al. 2008; 
Wang et al. 2008; Wu et al. 2013). Therefore, estimating site- 
specific selection from natural sequences is an imperfect 
method for inferring realistic evolutionary models for phylo- 
genetic analyses. 

Here, I demonstrate a radically different approach for 
constructing quantitative evolutionary models: Direct exper- 
imental measurement. This approach bypasses the aforemen- 
tioned problem of proliferating free parameters because 
site-specific selection is measured experimentally without 
consideration of naturally occurring sequences. The evolu- 
tionary models constructed from these experiments therefore 
do not contain any parameters that must be estimated 
from the natural sequences that are being analyzed 
phylogenetically. 

Specifically, using influenza nucleoprotein (NP) as an 
example, I experimentally estimate mutation rates via 
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limiting-dilution passage and site-specific selection via deep 
mutational scanning (Fowler et al. 2010; Araya and Fowler 
2011), a combination of high-throughput mutagenesis, func- 
tional selection, and deep sequencing. I then show that these 
experimental measurements can be used to create a param- 
eter-free evolutionary model describes the NP gene phylogeny 
far better than existing models with numerous free parame- 
ters. Finally, I discuss how the increasing availability of data 
from high-throughput experimental strategies such as the 
one employed here has the potential to transform analyses 
of genetic data by augmenting generic statistical models of 
evolution with detailed molecular level information. 

Results 

Components of an Experimentally Determined 
Evolutionary Model 

A phylogenetic evolutionary model specifies the rate at which 
one genotype is replaced by another. These rates of genotype 
substitution are determined by the underlying rates at which 
new mutations arise and the subsequent selection that acts 
upon them (Halpern and Bruno 1998; Thome et al. 2007). 
A standard assumption in molecular phylogenetics is that the 
rate of genotype substitution can be decomposed into inde- 
pendent substitution rates at individual sites. Here, I make this 
assumption at the level of codon sites, and use P rAy to denote 
the rate that site r substitutes from codon x to y given that 
the identity is already x. I further assume that it is possible to 
decompose P rj(y as 

^r,xy — Qxy X f>,xy, (1) 

where Qxy is the rate of mutation from x to y (assumed to be 
constant across sites) and F rAy is the site-specific probability 
that a mutation from x to y will fix at site r if it arises. Both are 
assumed to be constant over time. 

Given the evolutionary model described by equation (1), 
the challenge is to experimentally estimate the mutation 
rates and the fixation probabilities F rjXy . In the following 
sections, I describe these experiments. 

Measurement of Mutation Rates 
A general challenge in quantifying mutation rates is the dif- 
ficulty of separating mutations from the subsequent selection 
that acts upon them. To decouple mutation from selection, I 
utilized a previously described method for growing influenza 
viruses that package green fluorescent protein (GFP) in the 
PB1 segment (Bloom et al. 2010). The GFP does not contrib- 
ute to viral growth and so is not under functional selection — 
therefore, substitutions in this gene accumulate at the muta- 
tion rate. 

To drive the rapid accumulation of substitutions in 
the GFP gene, I performed limiting-dilution mutation- 
accumulation experiments (Halligan and Keightley 2009). 
Specifically, I passaged 24 replicate populations of GFP- 
carrying influenza viruses by limiting dilution in tissue culture, 
at each passage serially diluting the virus to the lowest con- 
centration capable of infecting target cells. Because each lim- 
iting dilution bottlenecks the population to one or a few 



Table 1. Mutations Identified by Sequencing the 720-Nucleotide GFP 
Gene Packaged in the PB1 Segment After 25 Limiting-Dilution 
Passages for 24 Independent Replicates. 



Clone Mutations 



Clone 1 


C62T (G21V), T693C (synonymous), del153-522 (indel) 


Clone 2 


None 


Clone 3 


C29T (T10I) 


Clone 4 


None 


Clone 5 


None 


Clone 6 


C429A (synonymous), C447T (synonymous) 


Clone 7 


None 


Clone 8 


None 


Clone 9 


C646A (R216S) 


Clone 10 


C471T (K157N), C703A (D235N) 


Clone 11 


T111C (synonymous), T718C (*240E) 


Clone 12 


T25C (F9L), T26C (F9S) 


Clone 13 


C45T (synonymous), C549T (synonymous) 


Clone 14 


T319C (Y107H), C372T (synonymous), C539T (A180V) 


Clone 15 


A488C (K163T) 


Clone 16 


C274T (C92C) 


Clone 17 


None 


Clone 18 


None 


Clone 19 


None 


Clone 20 


C527A (S176N), A676C (T226A) 


Clone 21 


C4A (V21) 


Clone 22 


T266C (M89T) 


Clone 23 


None 


Clone 24 


C30T (synonymous), del45-590 (indel) 



Note. — The numbering is sequential beginning with the first nucleotide of the GFP 
start codon. For nonsynonymous mutations, the induced amino acid change is 
indicated in parentheses. 



infectious virions, mutations fix rapidly. After 25 rounds of 
passage, the GFP gene was Sanger sequenced for each repli- 
cate to identify 24 substitutions (tables 1 and 2), for an overall 
rate of 5.6 x 10 -5 mutations per nucleotide per tissue- 
culture generation — a value similar to that estimated previ- 
ously by others using a somewhat different experimental 
approach (Parvin et al. 1986). The rates of different types of 
mutations are in table 3 and possess expected features such as 
an elevation of transitions over transversions. 

Because an observed mutation of A— >• G can arise from 
either this change on the sequenced strand or a change of 
T —> C on the complementary strand, then assuming that the 
same molecular mutation process affects both strands, there 
are only the six different mutation rates shown in table 3. 
Specifically, let R m ^ n represent the rate at which nucleotide 
m mutates to n given that the identity is already m, and let m c 
denote the complement of m (e.g., A c is T). The assumption 
that the same molecular mutation process affects both 
strands means that R m ^„ = R mc ^„ c . An additional empirical 
observation from table 3 is that the mutation rates for 
influenza are approximately symmetric, with the rate of 
each mutation approximately equal to its reversal 
(R m ^ n + R mc ^„ c ^ R n ^m + Rfi c ^m c )- Because it somewhat 
simplifies computational aspects of the subsequent phyloge- 
netic analyses, I enforce this empirical observation of 
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Table 2. Counts for Different Types of Mutations After the 25 
Limiting-Dilution Passages. 



Table 3. Influenza Mutation Rates. 



Mutation Type 



Mutation Type 



Number of Occurrences 



Total substitutions 


24 


Transversions 


6 


Transitions 


18 


Nonsynonymous 


15 


Synonymous 


8 


Stop codons 


1 


Indels 


2 


G 


1 


C 


6 


T^ A 


0 


G — T 


3 




n 
u 


A 


4 


T 


7 


C^G 


0 


A 


1 


A^ T 


0 


A^ G 


1 


A^ C 


1 



Note. — The numbers are calculated from table 1. Given that GFP is 720 nucleotides 
long, the data suggest a viral mutation rate of 5.6 x 10~ 5 mutations per nucleotide 
per tissue-culture generation. 



approximately symmetric mutation rates to be exactly true 
by taking the rates of mutations and their reversals to be the 
average of the two. With the further assumption that codon 
mutations occur a single nucleotide at a time, the mutation 
rates Qx y from codon x to y are estimated from the experi- 
mental data in table 3 as 



o 

4.7 x 1(T 5 
1.84 x 1CT 5 
6.0 x 10" 6 

3.8 x 10" 6 



f x and y differ by more than one nucleotide, 
f x and y differ by A — > G,T — > C,G — > A, or C - 
f x and y differ by A — > C,T — > G,C — > A, or G - 
f x and y differ by A — > T or T — > A, 
f x and y differ by G — >■ C or C — > G. 



(2) 



These mutation rates define the first term in the evolutionary 
model specified by equation (1). 

Deep Mutational Scanning to Assess Effects of 
Mutations on NP 

Estimation of the fixation probabilities F rAy in equation (1) 
requires quantifying the effects of all «s 10 4 possible amino 
acid mutations to NP. Such large-scale assessments of muta- 
tional effects are feasible with the advent of deep mutational 
scanning, a recently developed experimental strategy of high- 
throughput mutagenesis, selection, and deep sequencing 
(Fowler et al. 2010; Araya and Fowler 2011) that has now 
been applied to several genes (Fowler et al. 2010; Traxlmayr 
et al. 2012; Melamed et al. 2013; Roscoe et al. 2013; Starita 
et al. 2013). Applying this experimental strategy to NP 
requires creating large libraries of random gene mutants, 
using these genes to generate pools of mutant influenza vi- 
ruses which are then passaged at low multiplicity of infection 
(MOI) to select for functional variants, and finally using 



A — > G, T — > C (transition) 
G — > A, C — > T (transition) 
A— > C, T^ G (transversion) 
C— > A, G— > T (transversion) 
A — > T, T — > A (transversion) 
G — > C, C — > G (transversion) 



Rate 

2.4 x 10 5 

2.3 x 10 5 
9.0 x 10 6 

9.4 x 10 6 
3.0 x 10 6 
1.9 x 10 6 



Note. — Numbers represent the probability a site that has the parent identity will 
mutate to the specified nucleotide in a single tissue-culture generation and are 
calculated from tables 1 and 2 after adding one pseudocount to each mutation 
type. Mutations are in pairs because an observed change of A — » G can derive either 
from this mutation on the sequenced strand or a C on the complementary 
strand, and so the paired mutations are indistinguishable assuming that the same 
mutational process applies to both strands of the replicated nucleic acid molecule. 
The numbers are the estimated rates of each individual mutation, so, for example, 
the observed rate of change from A— > G is 2 x (2.4 x 10~ 5 ) because this change 
can arise from either of the two mutations A— > G and T— » C 



lllumina sequencing to assess the frequency of each mutation 
in the input mutant genes and the resulting viruses. Because 
NP plays an essential role in influenza genome packaging, 
replication, and transcription (Portela and Digard 2002; Ye 
et al. 2006), mutations that interfere with NP function or 
stability will impair or ablate viral growth. Such mutations 
will therefore be depleted in the mutant viruses relative to 
the input mutant genes. 

Most previous applications of deep mutational scanning 
have examined single-nucleotide mutations to genes, because 
such mutations can easily be generated by error-prone poly- 
merase chain reaction (PCR) or other nucleotide-level muta- 
genesis techniques. However, many amino acid mutations are 
not accessible by single-nucleotide changes. I therefore used a 
PCR-based strategy to construct codon-mutant libraries that 
contained multinucleotide (i.e., GGC — > act ) and single- 
nucleotide (i.e., GGC — > AGC) mutations. The use of codon- 
mutant libraries has an added benefit during the subsequent 
analysis of the deep sequencing when trying to separate true 
mutations from errors, because the majority (54 of 63) pos- 
sible codon mutations involve multinucleotide changes, 
whereas sequencing and PCR errors generate almost exclu- 
sively single-nucleotide changes. I used identical experimental 
procedures to construct two codon-mutant libraries of NP 
from the wild-type (WT) human H3N2 strain A/Aichi/2/1968 
and two from a variant of this NP with a single amino acid 
substitution (N334H) that enhances protein stability 
(Ashenberg et al. 2013; Gong et al. 2013). These codon- 
mutant libraries are termed WT-1, WT-2, N334H-1, and 
N334H-2. Each of these four mutant libraries contained 
more than 10 s unique plasmid clones. Sanger sequencing of 
30 clones drawn roughly equally from the four libraries 
revealed that the number of codon mutations per clone fol- 
lowed a Poisson distribution with a mean of 2.7 (fig. 1). These 
codon mutations were distributed roughly uniformly along 
the gene sequence and showed no obvious biases toward 
specific mutations (fig. 1). Most of the a;10 4 unique amino 
acid mutations to NP therefore occur in numerous different 
clones in the four libraries, both individually and in combina- 
tion with other mutations. 
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Fig. 1. The codon-mutant libraries as assessed by Sanger sequencing 30 individual clones. (A) The clones have an average of 2.7 codon mutations and 
0.1 indels per full-length NP coding sequence, with the number of mutated codons per gene following an approximately a Poisson distribution. (B) The 
number of nucleotide changes per codon mutation is roughly as expected if each codon is randomly mutated to any of the other 63 codons, with a 
slight elevation in single-nucleotide mutations. (C) The mutant codons have a uniform base composition. (D) Mutations occur uniformly along the 
primary sequence. (£) In clones with multiple mutations, there is no tendency for mutations to cluster. Shown is the actual distribution of pairwise 
distances between mutations in all multiply mutated clones compared with the distribution generated by 1,000 simulations where mutations are placed 
randomly along the primary sequence of each multiple-mutant clone. The data and code for this figure are available at https://github.com/jbloom/ 
SangerMutantl_ibraryAnalysis/tree/v0.21 (last accessed May 31, 2014). 



To assess effects of the mutations on viral replication, the 
plasmid mutant libraries were used to create pools of mutant 
influenza viruses by reverse genetics (Hoffmann et al. 2000). 
The viruses were passaged twice in tissue culture at low MOI 
to enforce a linkage between genotype and phenotype. The 
NP gene was reverse transcribed and PCR amplified from viral 
RNA after each passage, and similar PCR amplicons were 
generated from the plasmid mutant libraries and a variety 
of controls designed to quantify errors associated with se- 
quencing, reverse transcription, and viral passage (fig. 2). 
The entire process outlined in figure 2 was performed in 
parallel but separately for each of the four mutant libraries 
(WT-1, WT-2, N334H-1, and N334H-2) in what will be termed 
one experimental replicate. This entire process of viral crea- 
tion, passaging, and sequencing was then repeated indepen- 
dently for all four libraries in a second experimental replicate. 
The two independent replicates will be termed replicate A 
and replicate B. 

The mutation frequencies in all samples were quantified by 
lllumina sequencing, using overlapping paired-end reads to 



reduce errors (supplementary fig. SI, Supplementary Material 
online). Each sample produced k. 10 7 paired reads that could 
be aligned to NP, providing an average of « 5 x 10 5 calls per 
codon (supplementary fig. S2, Supplementary Material 
online). Sequencing of unmutated NP plasmid revealed a 
low rate of errors, which were almost exclusively single- 
nucleotide changes (fig. 3). As expected, the plasmid 
mutant libraries contained a high frequency of single and 
multinucleotide codon mutations (fig. 3). Mutation frequen- 
cies for unmutated RNA or viruses created from unmutated 
NP plasmid were only slightly above the sequencing error rate 
(fig. 3), indicating that reverse transcription and viral replica- 
tion introduced few mutations relative to the targeted mu- 
tagenesis in the plasmid libraries. Mutation frequencies were 
reduced in the mutant viruses relative to the mutant plasmids 
used to create these viruses, particularly for nonsynonymous 
and stop-codon mutations (fig. 3) — consistent with selection 
purging deleterious mutations. These results indicate that 
the deep mutational scanning experiment successfully intro- 
duced many of the NP variants in the plasmid mutant 
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Fic. 2. Design of the deep mutational scanning experiment. The sequenced samples are in yellow. Blue text indicates sources of mutation and selection; 
red text indicates sources of errors. The comparison of interest is between the mutation frequencies in the mutDNA and mutvirus samples, because 
changes in frequencies between these samples represent the action of selection. However, because some of the experimental techniques have the 
potential to introduce errors, the other samples are also sequenced to quantify these unintended sources of error. Each of the two experimental 
replicates (replicates A and B) involved independently repeating the entire viral rescue, viral passaging and sequencing process for each of the four 
plasmid mutant libraries (WT-1, WT-2, N334H-1, and N334H-2). 



libraries into mutant viruses, which were then subjected to 
purifying selection against mutations that interfered with viral 
replication. 

A key question is the extent to which the possible muta- 
tions were sampled in both the plasmid mutant libraries and 
the mutant viruses created from these plasmids. The deep 
mutational scanning would not achieve its goal if only a small 
fraction of possible mutations are sampled by the mutant 
plasmids or by the mutant viruses created from these plas- 
mids (the latter might be the case if there is a bottleneck 
during virus creation, such that all viruses are generated 
from only a few plasmids). Fortunately, figure 4 shows that 
the sampling of mutations was quite extensive in both the 
mutant plasmids and the mutant viruses. Specifically, figure 4 
suggests that for each replicate, nearly all codon mutations 
were sampled numerous times in the plasmid mutant librar- 
ies and that over 75% of codon mutations were sampled by 
the mutant viruses. Figure 4 also suggests that replicate A was 
technically superior to replicate B in the thoroughness with 
which mutations were sampled by the mutant viruses. 
Because most amino acids are encoded by multiple codons, 
the fraction of amino acid mutations sampled in each repli- 
cate is even higher than the >75% of sampled codon muta- 
tions. So although the experiments may not have exhaustively 
examined every possible codon mutation, the thoroughness 
of sampling is certainly sufficient to make the sort of statistical 
inferences about mutational effects that are necessary to 
construct a quantitative evolutionary model. 

Inference of Site-Specific Amino Acid Preferences 
Qualitatively, it is obvious that changes in mutation frequen- 
cies between the plasmid mutant libraries and the resulting 
mutant viruses reflect selection. However, it is less obvious 
how to quantitatively analyze this information. Selection 
acts on the full genomes of all viruses in the population. 



In contrast, the experiments only measure site-independent 
mutation frequencies averaged over the population. Here, 
I have analyzed this data by assuming that each site has an 
inherent preference for each possible amino acid. The moti- 
vation for envisioning site heterogenous but site-independent 
amino acid preferences comes from experiments suggesting 
that the dominant constraint on mutations that fix during 
NP evolution relates to protein stability (Gong et al. 2013) and 
that mutational effects on stability tend to be conserved in a 
site-independent manner (Ashenberg et al. 2013). Because 
the experiments generally examine each mutation in combi- 
nation with several other mutations (the average clone has 
between two and three codon mutations; fig. 1), the site- 
specific amino acid preferences are not simply selection 
coefficients for specific mutations. Instead, they reflect the 
effect of each mutation averaged over a set of genetic 
backgrounds. 

Specifically, let tx,^ denote the preference of site r for 
amino acid a, with ^ a 7zv,a = 1- Figure 3 indicates that 
most observed mutations are the result of the desired 
codon mutagenesis but that there is also a low rate of appar- 
ent mutations arising from lllumina sequencing errors and 
reverse transcription. The expected frequency / rA of mutant 
codon x at site r in the mutant viruses is related to the pref- 
erence Tt r ,A(x) f° r its encoded amino acid A(x) by 
f r x = e r x + p r x + '^'■ xXJI '- A " ,) — where fi riX is the frequency 

2 , y ^'.y XJT r,At,y> 

that site r is mutagenized to codon x in the plasmid mutant 
library, s rjt is the frequency the site is erroneously identified as 
x during sequencing, p riX is the frequency the site is mutated 
to x during reverse transcription, y is summed over all codons, 
and the probability that a site experiences multiple mutations 
or errors in the same clone is taken to be negligibly small. The 
observed codon counts are multinomially distributed around 
these expected frequencies, so by placing a symmetric 
Dirichlet-distribution prior over 7zv, 0 and jointly estimating 
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Fig. 3. Per-codon mutation frequencies for each library (WT-1, WT-2, N334H-1, and N334H-2) in (A) replicate A or (B) replicate B. The samples are 
named as in figure 2. Errors due to lllumina sequencing (DNA sample), reverse transcription (RNA sample), and viral replication (virus-p1 and virus-p2 
samples) are rare and are mostly single-nucleotide changes. The codon-mutant libraries (mutDNA) contain a high frequency of single- and multi- 
nucleotide changes as expected from Sanger sequencing (rightmost bars of this plot and fig. 1; note that Sanger sequencing is not subject to lllumina 
sequencing errors that affect all other samples). Mutations are reduced in mutvirus samples relative to mutDNA plasmids used to create these mutant 
viruses, with most of the reduction in stop-codon and nonsynonymous mutations — as expected if deleterious mutations are purged by purifying 
selection. Details of the analysis used to generate these figures are at http://jbloom.github.io/mapmuts/example_2013Analysis_lnfluenza_NP_Aichi68. 
html (last accessed May 31, 2014). 



the error (e w and p rA ) and mutation (/x r> J rates from the 
appropriate samples in figure 2, it is possible to infer the 
posterior mean for all amino acid preferences by Markov 
chain Monte Carlo (MCMC, see Materials and Methods). 

A basic check on the consistency of the overall experimen- 
tal and computational approach is to compare the amino 
acid preferences inferred from different replicates or different 
viral passages of the same replicate. Figure 5A and B shows 
that the preferences inferred from the first and second vi- 
ral passages within each replicate are extremely similar, indi- 
cating that most selection occurs during initial viral creation 



and passage and that technical variation (preparation of sam- 
ples, stochasticity in sequencing, etc.) has little impact. 
A more crucial comparison is between the preferences in- 
ferred from the two independent experimental replicates. 
This comparison (fig. 5C) shows that preferences from the 
independent replicates are substantially but less perfectly 
correlated — probably the imperfect correlation is because 
the mutant viruses created by reverse genetics independently 
in each replicate are different incomplete samples of the 
many clones in the plasmid mutant libraries. Nonetheless, 
the substantial correlation between replicates shows that 
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Fig. 4. The completeness with which mutations were sampled in the mutant plasmids and viruses, as assessed by the counts for each multinucleotide 
codon mutation in the combined libraries of (A) replicate A or (B) replicate B. Restricting these plots to multinucleotide codon mutations avoids 
confounding effects from sequencing errors, which typically generate single-nucleotide codon mutations. Very few multinucleotide codon mutations 
are observed more than once in the unmutagenized controls (DNA, RNA, virus-pi, and virus-p2). Nearly all multinucleotide codon mutations are 
observed many times in the mutant plasmid libraries (mutDNA). About half the multinucleotide codon mutations are found at least five times in the 
mutant viruses (mutvirus-pl and mutvirus-p2), indicating that at least half the possible mutations were incorporated into a virus. However, this is only a 
lower bound, because deleterious mutations will be absent from the mutant viruses due to purifying selection. If the analysis is restricted to synonymous 
multinucleotide codon mutations (which are less likely to be deleterious), then over 75% of the possible mutations were incorporated into a virus. This is 
still only a lower bound, because even synonymous mutations are sometimes strongly deleterious to influenza (Marsh et al. 2008). The completeness 
with which amino acid mutations are sampled is higher due to the redundancy of the genetic code. Note that replicate A is superior to replicate B in 
terms of the completeness with which the mutations are sampled by the mutant viruses. Details of the analysis used to generate these figures are at 
http://jbloom.github.io/mapmuts/example_2013Analysis_lnfluenza_NP_Aichi68.html (last accessed May 31, 2014). 



the sampling is sufficient to clearly reveal inherent preferences 
despite these experimental imperfections. Presumably better 
inferences can be made by aggregating data via averaging of 
the preferences from both replicates. Figure 5D shows such 
average preferences from the first passage of both replicates. 
These preferences are consistent with existing knowledge 
about NP function and stability. For example, at the con- 
served residues in NP's RNA binding interface (Ye et al. 
2006), the amino acids found in natural sequences tend to 
be the ones with the highest preferences (table 4). Similarly, 
for mutations that have been experimentally characterized as 
having large effects on NP protein stability (Ashenberg et al. 
2013; Cong et al. 2013), the stabilizing amino acid has the 
higher preference (table 5). 

The Experimentally Determined Evolutionary Model 
The final step is to use the amino acid preferences to estimate 
the fixation probabilities F wy , which can then be combined 
with the mutation rates to create a fully experimentally 
determined evolutionary model. Intuitively, it is obvious 
that the amino acid preferences provide information about 
the fixation probabilities. For instance, it seems reasonable to 



expect that a mutation from x to y at site r will be more likely 
to fix (relatively larger value of F™) if amino acid A(y) is 
preferred to .4(x) at this site (if it^Aiy) > ^r^Ux) ) an d ' ess 
likely to fix if Tt r jUy) < 7t r ^(x)- However, the exact relation- 
ship between the amino acid preferences and the fixation 
probabilities is unclear. A rigorous derivation would require 
knowledge of unknown and probably unmeasurable popula- 
tion-genetics parameters for both the deep mutational 
scanning experiment and the naturally evolving populations 
that gave rise to the sequences being analyzed phylogeneti- 
cally. Instead, I provide two heuristic relationships. Both rela- 
tionships satisfy detailed balance (reversibility), such that 
n r ,A(x) x F r . xy = TT r ,A(y) x F r,yx, meaning that F rAy defines a 
Markov process with 7C r ^(x) proportional to its stationary 
state when all amino acid interchanges are equally probable. 

It is helpful to first consider what the amino acid prefer- 
ences values actually represent. Most NP variants in the deep 
mutational scanning libraries contain multiple mutations, so 
the amino acid preferences represent the mutational effects 
averaged over the nearby genetic neighborhood of the parent 
protein. Therefore, one interpretation is that a preference is 
proportional to the fraction of genetic backgrounds in which 
a mutation is tolerated, such that a mutation from x to y 
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Fic. 5. Amino acid preferences. (A) and (B) Preferences inferred from passages 1 and 2 are similar within each replicate, indicating that most selection 
occurs during initial viral creation and passage and that technical variation is small. (C) Preferences from the two independent replicates are also 
correlated but less perfectly. The increased variation is presumably due to stochasticity during the independent viral creation from plasmids for each 
replicate. (D) Preferences for all sites in NP (the N-terminal Met was not mutagenized) inferred from passage 1 of the combined replicates. Letters' 
heights are proportional to the preference for that amino acid and are colored by hydrophobicity. RSA and secondary structure are overlaid for residues 
in crystal structure. Correlation plots show Pearson's R and P value. Numerical data for (D) are in supplementary file SI, Supplementary Material online. 
The preferences are consistent with existing knowledge about mutations to NP (tables 4 and 5). The computer code used to generate this figure is at 
http://jbloom.github.io/mapmuts/example_2013Analysis_lnfluenza_NP_Aichi68.html (last accessed May 31, 2014). 
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Table 4. 


For Residues Involved in NP's RNA-Binding Groove, the Preferences and Expected Evolutionary Equilibrium Frequencies from the 


Experiments Correlate Well with the Amino Acid Frequencies in Naturally Occurring Sequences. 






Residue 


Frequencies in 


Experimentally Measured 




Expected Equilibrium 




Natural 


Amino Acid Preferences 




Evolutionary Frequencies 




Sequences 






from Experiments 


65 


R (0.83), K (0.17) 


R (0.40), K (0.10), N (0.06) 


R 


(0.58), S (0.07) 


150 


R (1.00) 


R (0.46), K (0.06), P (0.05), L (0.05) 


R 


(0.63), L (0.07) 


152 


R (1.00) 


R (0.52), K (0.07), Q (0.07) 


R 


(0.71) 


156 


R (1.00) 


R (0.52), Q (0.06) 


R 


(0.69), S (0.06) 


174 


R (1.00) 


R (0.58), N (0.06), T (0.05) 


R 


(0.75) 


175 


R (1.00) 


R (0.46), K (0.16) 


R 


(0.66), K (0.08), S (0.05) 


195 


R (1.00) 


R (0.51) 


R 


(0.69) 


199 


R (1.00) 


R (0.44), M (0.08), Y (0.06), V (0.05) 


R 


(0.64), V (0.05) 


213 


R (1.00) 


R (0.51), N (0.06) 


R 


(0.69) 


214 


R f0 72l K CO 281 


K (0.24), H (0.09), R (0.09), Q (0.08), 


R 


(0.19), K (0.17), A (0.09), H (0.07), 






M (0.06), N (0.06), A (0.06), 1 (0.06) 




1 (0.06), L (0.06), Q (0.06) 


221 


R (1.00) 


R (0.46), E (0.07), K (0.07) 


R 


(0.66), L (0.05) 


236 


R (0.94), K (0.06) 


K (0.32), R (0.30) 


R 


(0.51), K (0.18) 


355 


R (1.00) 


R (0.29), L (0.13), K (0.09) 


R 


(0.43), L (0.19) 


357 


K (0.56), Q (0.44) 


K (0.38), E (0.09), N (0.07), Y (0.05) 


K 


(0.31), R (0.09), E (0.08), N (0.06) 


361 


R (1.00) 


R (0.53), V (0.13) 


R 


(0.68), V (0.11) 


391 


R (1.00) 


R (0.59), K (0.09) 


R 


(0.77) 


148 


Y (1.00) 


Y (0.54), I (0.06) 


Y 


(0.44), 1 (0.07), T (0.07), P (0.06), S (0.06) 



Note. — Shown are the 17 residues in the NP RNA-binding groove in Ye et al. (2006). The second column gives the frequencies of amino acids in all 21,108 full-length NP 
sequences from influenza A (excluding bat lineages) in the Influenza Virus Resource as of January 31, 2014. The third column gives the experimentally measured amino acid 
preferences (fig. 5D). The fourth column gives the expected evolutionary equilibrium frequency of the amino acids (fig. 6). Only residues with frequencies or preferences > 0.05 
are listed. In all cases, the most abundant amino acid in the natural sequences has the highest expected evolutionary equilibrium frequency. In 15 of 17 cases, the most abundant 
amino acid in the natural sequences has the highest experimentally measured preference — in the other two cases, the most abundant amino acid in the natural sequences is 
among those with the highest preference. 



Table 5. For Residues Where Mutations Have Previously Been Characterized as Having Large Effects on the Stability of the A/Aichi/2/1968 NP, the 
More Stable Amino Acid Has a Higher Preference and Is Also More Frequent in Actual NP Sequences. 


Residue 


Stability Measurement 


Frequencies in 
Natural Sequences 


Experimentally 
Measured Amino 
Acid Preferences 


Expected Equilibrium 
Evolutionary Frequencies 
from Experiments 


259 
280 
334 
384 


L259S is destabilizing (AT m = -3.9°C) 
V280A is destabilizing (AT m = -3.5°C) 
N334H is stabilizing (AT m = 4.5°C) 
R384G is destabilizing (AT m = -4.8°C) 


L (0.98), S (0.02) 
V (0.89), A (0.10) 
H (0.93), N (0.07) 
R (0.80), C (0.17) 


L (0.23), S (0.04) 
V (0.19), A (0.02) 
H (0.28), N (0.12) 
R (0.22), C (0.04) 


L (0.36), S (0.06) 
V (0.25), A (0.03) 
H (0.23), N (0.10) 
R (0.39), C (0.04) 



Note. — The second column gives the experimentally measured change in melting temperature (AT m ) induced by the mutation to the A/Aichi/2/1968 NP as measured in (Gong 
et al. 2013); these mutational effects on stability are largely conserved in other NPs (Ashenberg et al. 2013). The third column gives the frequencies of the amino acids in all 
21,108 full-length NP sequences from influenza A (excluding bat lineages) in the Influenza Virus Resource as of January 31, 2014. The fourth column gives the experimentally 
measured amino acid preferences (fig. 5D). The fifth column gives the expected evolutionary equilibrium frequency of the amino acids (fig 6). 



is always tolerated if JT r ^ M > n r ^ x ) but is only sometimes 
tolerated if 7t r> Mx) < 1T r,A(y)- ' n this interpretation, there 
should be strong selection during initial viral growth depend- 
ing on whether the mutation is tolerated in the particular 
genetic background in which it occurs, and then there 
should be little further enrichment or depletion during 
subsequent viral passages — loosely consistent with fig. 5A,B, 
which shows that the amino acid preferences inferred af- 
ter two viral passages are very similar to those inferred after 
one passage. Note that this interpretation can be related 
to the selection-threshold evolutionary dynamics described 
in Bloom et al. (2007). An equation that describes this 
scenario is 



1 if ^r,A(y) - ^Ax) 



otherwise. 



(3) 



This equation is equivalent to the Metropolis acceptance 
criterion (Metropolis et al. 1953). 

An alternative interpretation is that tt^a( X ) reflects the 
selection coefficient for the amino acid A(x) at site r. In 
this case, if the it^ a values represent the expected amino 
acid equilibrium frequencies in a hypothetical evolving pop- 
ulation in which all amino acid interchanges are equally likely, 
and assuming (probably un realistically) that this hypothetical 
population and the actual population in which NP evolves are 
in the weak-mutation limit (i.e., the population is mostly 
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homogenous, see Desai and Fisher 2007) and have identical 
constant effective population sizes, then Halpern and Bruno 
(1998) derive 



r.AM 
r.Aty) 



r «vt(x) — ^r,A(y) 



otherwise. 



(4) 



Given one of these definitions for the fixation probabilities 
and the mutation rates defined by equation (2), the experi- 
mentally determined evolutionary model is defined by equa- 
tion (1). For the mutation rates and fixation probabilities used 
here, this evolutionary model defines a stochastic process 
with a unique stationary state for each site r. These stationary 
states give the expected amino acid frequencies at evolution- 
ary equilibrium. These evolutionary equilibrium frequencies 
are shown in figure 6 and are somewhat different than the 
amino acid preferences because they also depend on the 
structure of the genetic code (and the mutation rates when 
these are nonsymmetric). For example, if arginine and lysine 
have equal preferences at a site, arginine will be more evolu- 
tionary abundant because it has more codons. 

Phylogenetic Analyses 

The experimentally determined evolutionary model can be 
used to compute phylogenetic likelihoods, thereby enabling 
its comparison to existing models. To perform these compar- 
isons, I first used codonPhyML (Gil et al. 2013) to infer max- 
imum-likelihood trees (fig. 7) for NP sequences from human 
influenza using the Goldman-Yang (GY94) (Goldman and 
Yang 1994) and the Kosiol et al. (2007, KOSI07+ F) codon 
substitution models. These tree topologies were then fixed, 
and the branch lengths and model parameters were opti- 
mized by maximum likelihood for each of the models. 

These models differ in their number of free parameters. 
A "free parameter" is any variable with a value that is deter- 
mined from the same naturally occurring NP sequences that 
are being analyzed phylogenetically. The experimentally de- 
termined evolutionary model has no free parameters, because 
all of the properties of this model were determined by exper- 
iments that did not utilize information from naturally occur- 
ring NP sequences (the amino acid preferences are inferred 
from the experiments using a symmetric prior, so in the ab- 
sence of experimental data all 20 amino acids would be in- 
ferred as equally preferable at each site). Similarly, although 
the KOSI07 + F model has a large number of exchangeability 
variables that were determined empirically, these variables are 
not free parameters because they were specified ahead of 
time from analysis of a general set of gene homologs that 
did not include NP. However, both GY94 and KOSI07 + F also 
contain free parameters that are estimated from the NP 
sequences that are being analyzed phylogenetically. In the 
simplest form, GY94 contains 11 such free parameters (nine 
equilibrium frequencies plus transition-transversion and 
synonymous-nonsynonymous ratios), whereas KOSI07 + F 
contains 62 parameters (60 frequencies plus transition- 



transversion and synonymous-nonsynonymous ratios). 
More complex variants add parameters allowing variation 
in substitution rate (Yang 1994) or synonymous-nonsynon- 
ymous ratio among sites or lineages (Yang and Nielsen 1998; 
Yang et al. 2000). For all these models, HYPHY (Pond et al. 
2005) was used to calculate the likelihood after optimizing 
branch lengths and model parameters on the fixed tree 
topologies. 

Comparison of these likelihoods strikingly validates the su- 
periority of the experimentally determined model (tables 6 
and 7). Adding free parameters generally improves a model's 
fit to data, and this is true within GY94 and KOSI07 + F. 
However, the parameter-free experimentally determined evo- 
lutionary model describes the sequence phylogeny with a 
likelihood far greater than even the most highly parameter- 
ized GY94 and KOSI07 + F variants. Interpreting the amino 
acid preferences as the fraction of genetic backgrounds that 
tolerate a mutation (eq. 3) outperforms interpreting them as 
selection coefficients (eq. 4), although either interpretation 
yields evolutionary models for NP far superior to GY94 or 
KOSI07 + F. Comparison using Akaike information content 
(AIC) to penalize parameters (Posada and Buckley 2004) even 
more emphatically highlights the superiority of the experi- 
mentally determined models. 

There is also a clear correlation between the quality and 
volume of experimental data and the phylogenetic fit: Models 
from individual experimental replicates give lower likelihoods 
than both replicates combined, and the technically superior 
replicate A (recall the comparison in fig. 4) gives a better 
likelihood than replicate B (tables 6 and 7). This fact suggests 
that improvements in experimental methodology that 
improve the accuracy of the measured mutational effects 
should lead to even better experimentally determined evolu- 
tionary models. 

In tables 6 and 7, the site-specific experimentally deter- 
mined model is compared with variants of two general 
models (GY94 and KOSI07 + F) that apply broadly to all pro- 
teins. More recently, it has become possible to estimate non- 
site-specific (identical across sites) codon and amino acid 
models using naturally occurring sequences from specific pro- 
teins or viruses (Dang et al. 2010; De Maio et al. 2013). One 
could therefore ask if the experimentally determined model is 
superior because it is site specific or simply because it is 
experimentally derived from deep mutational scanning of 
influenza. To address this question, I created "randomized" 
experimentally determined models in which the deep muta- 
tional scanning data were randomly shuffled among protein 
sites. These randomized models are still derived from deep 
mutational scanning of influenza but have lost their linkage 
to site-specific experimental information. These random- 
ized models are greatly inferior to all of the other models 
considered here (tables 6 and 7). Therefore, the superiority 
of the experimentally determined model is due to its utili- 
zation of site-specific information from the deep muta- 
tional scanning — if this site specificity is lost, the model 
becomes far worse than general models such as GY94 or 
KOSI07 + F. 
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Fig. 6. The expected frequencies of the amino acids at evolutionary equilibrium using the experimentally determined evolutionary model from passage 
1 of the combined replicates and equation (3) for the fixation probabilities. Note that these expected frequencies are slightly different than the amino 
acid preferences in figure 5D due to the structure of the genetic code. For instance, when arginine and lysine have equal preferences at a site, arginine 
will tend to have a higher evolutionary equilibrium frequency because it is encoded by more codons. The numerical data are in supplementary file S2, 
Supplementary Material online. The computer code used to generate this plot is at http://jbloom.github.io/phyloExpCM/example_2013Analysis_ 
lnfluenza_NP_Human_1918_Descended.html (last accessed May 31, 2014). 



Discussion 

These results establish that an experimentally determined 
evolutionary model is far superior to existing models for 
describing the phylogeny of NP gene sequences. The 
extent of this superiority is striking. The parameter-free evo- 
lutionary model dramatically outperforms even the most 
highly parameterized existing models using the parameter- 
penalizing metric of AIC — but more remarkably, it also 
outperforms these parameterized models by over 400 



log-likelihood units even in the absence of parameter penal- 
ization (table 6). The reason for this superiority is easy to 
understand: Proteins have strong and fairly conserved pref- 
erences for specific amino acids at different sites (Ashenberg 
et al. 2013), but these site-specific preferences are ignored by 
most existing phylogenetic models. Inspection of the over- 
laid bars in figure 5D illustrates the inadequacy of trying to 
capture these preferences simply by classifying sites based 
on gross features of protein structure (Thome et al. 1996; 
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Fic. 7. Phylogenetic tree of NPs from human influenza descended from a close relative of the 1918 virus. Black: H1N1 from 1918 lineage; green: seasonal 
H1N1; red: H2N2; blue: H3N2. Maximum-likelihood trees constructed using codonPhyML (Gil et al. 2013) with (A) the GY94 substitution model or (B) 
the KOSI07 + F substitution model. Up to three NP sequences per year from each subtype were used to build the tree. The A/Aichi/2/1968 NP that was 
the subject of this experiment was not one of the NP sequences randomly subsampled for the tree, so its name is indicated close to a nearly identical 
sequence that is shown in the tree. The computer code used to generate this tree is at http://jbloom.github.io/phyloExpCM/example_2013Analysis_ 
lnfluenza_NP_Human_1918_Descended.html (last accessed May 31, 2014). 



Goldman et al. 1998) — the site-specific amino acid prefer- 
ences are not simply related to secondary structure or 
solvent accessibility. The complexity of the preferences in 
figure 5D also show the limitations of attempting to infer 
amino acid preference parameters for a small number of site 
classes from sequence data (Lartillot and Philippe 2004; Le 
et al. 2008; Wang et al. 2008; Wu et al. 201 3), as it is clear that 
each site is unique. Direct experimental measurement there- 
fore represents a highly attractive method for determining 
the idiosyncratic constraints that affect the evolution of 
each site in a gene. 

Another appealing aspect of an experimentally determined 
evolutionary model is interpretability. A frustrating aspect of 
existing evolutionary models is the inability to interpret many 
of their free parameters directly in evolutionary or molecular 
terms. For example, the equilibrium frequency parameters 
used by most existing models reflect some unknown combi- 
nation of mutational bias and selection for specific codons or 
amino acids — but the relative contributions of these factors 
in determining the parameter values is unclear. On the other 
hand, all aspects of the experimentally determined evolution- 
ary model can be related to direct measurements, making 
them more amenable to direct interpretation. So even if 



such a model were eventually augmented with a few free 
parameters, this could be done in a way that allowed these 
parameters to retain a clear connection to the molecular 
processes of biology and evolution. 

The results presented here also demonstrate that phylo- 
genetic evolutionary models can be greatly improved while re- 
taining the assumption of independence of sites. Phylogenetic 
evolutionary models make two assumptions that are egre- 
giously bad from the perspective of the protein chemist: First, 
these models assume that sites are identical (or at least can be 
described by a small number of classes), and second, they 
assume that sites are independent. The experimentally deter- 
mined model eliminates the first assumption but does noth- 
ing to relax the second. Is this model therefore inconsistent 
with the idea that epistasis is common during protein evolu- 
tion (Lunzer et al. 2010)? In fact, experiments show that a 
general conservation of site-specific amino acid preferences is 
entirely consistent with epistasis. For instance, there is known 
epistasis among some of the mutations fixed along the NP 
phylogenetic tree analyzed here (Gong et al. 2013) — but the 
site-specific compatibilities of amino acids with the protein's 
structural stability are largely conserved among homologs on 
this tree, even for sites involved in epistatic interactions 
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Table 6. Likelihoods Computed Using Various Evolutionary Models After Optimizing the Branch Lengths for the Fixed Tree Topology Inferred 
Using the CY94 model (fig. 7). 



Model AAIC Log Likelihood Parameters 

(Optimized + Empirical) 



Experimental, combined replicates 


0.0 


-12,338.9 


0 (0 + 0) 


Experimental, replicate A 


67.9 


-12,372.8 


0 (0 + 0) 


Experimental, replicate B 


106.1 


-12,392.0 


0 (0 + 0) 


Halpern and Bruno, combined replicates 


357.9 


-12,517.9 


0 (0 + 0) 


Halpern and Bruno, replicate A 


393.0 


-12,535.4 


0 (0 + 0) 


Halpern and Bruno, replicate B 


455.5 


-12,566.7 


0 (0 + 0) 


CY94, beta to plus positive, one rate (M8) 


1,136.8 


-12,893.3 


14 (5 + 9) 


CY94, three-category to, one rate (M2a) 


1,209.5 


-12,929.7 


14 (5 + 9) 


CY94, gamma to, one rate (M5) 


1,218.0 


-12,935.9 


12 (3 + 9) 


CY94, one to, gamma rates 


1,485.7 


-13,069.8 


12 (3 + 9) 


KOSI07 + F, three-category to, one rate (M2a) 


1,679.7 


-13,113.8 


65 (5 + 60) 


KOSI07 + F, M8 rates-one 


1,680.5 


-13,114.1 


65 (5 + 60) 


CY94, one to, one rate (MO) 


1,754.1 


-13,205.0 


11 (2 + 9) 


KOSI07 + F, gamma to, one rate 


1,757.7 


- 13,154.8 


63 (3 + 60) 


KOSI07 + F, one to, gamma rates 


1,831.1 


-13,191.5 


63 (3 + 60) 


CY94, branch-specific to, gamma rates (M5) 


1,972.3 


-12,769.1 


556 (547 + 9) 


KOSI07 + F, one to, one rate (MO) 


2,254.2 


-13,404.0 


62 (2 + 60) 


KOSI07 + F, branch-specific to, gamma rates 


2,319.5 


-12,891.7 


607 (547 + 60) 


Randomized experimental, combined replicates 


3,741.0 


-14,209.4 


0 (0 + 0) 


Randomized experimental, replicate A 


3,809.6 


-14,243.7 


0 (0 + 0) 


Randomized experimental, replicate B 


3,840.4 


-14,259.1 


0 (0 + 0) 


Randomized Halpern and Bruno, combined replicates 


4,388.7 


-14,533.3 


0 (0 + 0) 


Randomized Halpern and Bruno, replicate B 


4,559.1 


-14,618.5 


0 (0 + 0) 


Randomized Halpern and Bruno, replicate A 


4,622.1 


-14,649.9 


0 (0 + 0) 



Note. — Experimentally determined models vastly outperform GY94 or KOSI07 + F. Models are sorted by AAIC (Posada and Buckley 2004) but note that the experimentally 
determined models all have much higher log likelihoods even before penalizing parameters. The experimentally determined models fit best if the amino acid preferences are 
interpreted as the fraction of genetic backgrounds that tolerate a mutation (eq. 3) rather than as selection coefficients (eq. 4). Randomizing the experimentally determined 
preferences among sites makes the models far worse. All variants of CY94 and KOSI07 + F contain empirical equilibrium frequencies plus a transition-transversion ratio and 
synonymous-nonsynonymous ratio (co) optimized by likelihood. Some variants allow co to vary across sites using discrete categories (M2a), a gamma distribution (M5), or a beta 
distribution plus a category (M8). Some variants allow a different w for each branch. Some variants allow the rate of substitution to be gamma distributed. 



(Ashenberg et al. 2013). The reason is that evolutionary rele- 
vant epistasis can arise from subtle and transient fluctuations 
in properties such as protein stability, whereas the phyloge- 
netic improvements from a site-specific model probably 
come mostly from capturing basic information about the 
compatibility of amino acids with a protein's evolutionarily 
conserved structure. Models that assume independence 
among sites can therefore still lead to major improvements 
if the site-specific amino acid preferences are accurately 
represented. 

The major drawback of the experimentally determined 
evolutionary model is its lack of generality. Although this 
model is clearly superior for influenza NP, it is entirely unsui- 
table for other genes. At first blush, it might seem that the 
arduous experiments described here provide data that is un- 
likely to ever become available for most situations of interest. 
However, it is worth remembering that today's arduous ex- 
periment frequently becomes routine in a few years. For 
example, the very gene sequences that are the subjects of 
molecular phylogenetics were once rare pieces of data — 
now such sequences are so abundant that they easily over- 
whelm modern computers. The experimental ease of the 
deep mutational scanning approach used here is on a com- 
parable trajectory: Similar approaches have already been 



applied to several proteins (Fowler et al. 2010; Traxlmayr 
et al. 2012; Melamed et al. 2013; Roscoe et al. 2013; Starita 
et al. 2013), and there continue to be rapid improvements in 
techniques for mutagenesis (Firnberg and Ostermeier 2012; 
Jain and Varadarajan 2014) and sequencing (Hiatt et al. 2010; 
Schmitt et al. 2012; Lou et al. 2013). Given these prospects for 
technical improvements in deep mutational scanning, it is 
therefore especially encouraging that the phylogenetic fit of 
the NP evolutionary model improves with the quality and vol- 
ume of experimental data from which it is derived (table 6). 
The increasing availability of similar high-throughput data 
for a vast range of proteins has the potential to transform 
phylogenetic analyses by greatly increasing the accuracy 
of evolutionary models, while at the same time replacing a 
plethora of free parameters with experimentally measured 
quantities that can be given clear biological and evolutionary 
interpretations. 

Materials and Methods 

Availability of Data and Computer Code 
lllumina sequencing data are available at the Sequence Read 
Archive (SRA) (accession SRP036064, http://www.ncbi.nlm. 
nih.gov/sra/?term=SRP036064, last accessed May 31, 2014). 
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Table 7. Likelihoods for the Various Evolutionary Models for the Tree Topology Inferred with CodonPhyML Using KOSI07+F. 

Model AAIC Log Likelihood Parameters 

(Optimized + Empirical) 



Experimental, combined replicates 


0.0 


— 1 2,334.6 


0 


(o 


+ 


0) 


Experimental, replicate A 


67.9 


— 1 2,368.5 


0 


(o 


+ 


0) 


Experimental, replicate B 


106.2 


-12,387.7 


0 


(o 


+ 


0) 


Halpern and Bruno, combined replicates 


356.8 


-12,513.0 


0 


(o 


+ 


0) 


Halpern and Bruno, replicate A 


391.5 


-12,530.3 


0 


(o 


+ 


o) 


Halpern and Bruno, replicate B 


454.8 


-12,562.0 


0 


(o 


+ 


0) 


CY94, beta tu plus positive, one rate (M8) 


1,183.4 


-12,912.3 


14 


(5 


+ 


9) 


GY94, three-category oj, one rate (M2a) 


1,209.4 


-12,925.3 


14 


(5 


+ 


9) 


CY94, gamma cu, one rate (M5) 


1,219.6 


-12,932.4 


12 


(3 


+ 


9) 


CY94, one to, gamma rates 


1,493.1 


-13,069.1 


12 


(3 


+ 


9) 


KOSI07 + F, three-category <o, one rate (M2a) 


1,676.0 


-13,107.6 


65 


(5 


+ 


60) 


KOSI07 + F, M8 rates-one 


1,676.6 


-13,107.9 


65 


(5 


+ 


60) 


KOSI07 + F, gamma m, one rate 


1,753.3 


-13,148.2 


63 


(3 


+ 


60) 


CY94, one o, one rate (MO) 


1,762.2 


-13,204.7 


11 


(2 


+ 


9) 


KOSI07 + F, one oj, gamma rates 


1,834.3 


-13,188.7 


63 


(3 


+ 


60) 


CY94, branch-specific <o, gamma rates (M5) 


1,980.8 


-12,769.0 


556 


(547 


+ 9) 


KOSI07 + F, one m, one rate (MO) 


2,256.8 


-13,401.0 


62 


(2 


+ 


60) 


KOSI07 + F, branch-specific oj, gamma rates 


2,324.0 


-12,889.6 


607 


(547 


+ 60) 


Randomized experimental, combined replicates 


3,741.3 


-14,205.2 


0 


(0 


+ 


0) 


Randomized experimental, replicate A 


3,809.4 


-14,239.3 


0 


(o 


+ 


0) 


Randomized experimental, replicate B 


3,841.4 


-14,255.3 


0 


(o 


+ 


0) 


Randomized Halpern and Bruno, combined replicates 


4,387.6 


-14,528.4 


0 


(o 


+ 


0) 


Randomized Halpern and Bruno, replicate B 


4,557.9 


-14,613.6 


0 


(o 


+ 


0) 


Randomized Halpern and Bruno, replicate A 


4,620.8 


-14,645.0 


0 


(o 


+ 


o) 


Note. — This table differs from table 6 in that it optimizes the likelihoods 


on the tree topology inferred with KOSI07 H 


- F rather than GY94. 







A description and links to the source code used to analyze the 
sequencing data and infer the amino acid preferences is at 
http://jbloom.github.io/mapmuts/example_2013Analysis_ 
lnfluenza_NP_Aichi68.html (last accessed May 31, 2014). A 
description and links to the source code used for the phylo- 
genetic analyses is at http://jbloom.github.io/phyloExpCM/ 
example_2013Analysis_lnfluenza_NP_Human_1918_ 
Descended.html (last accessed May 31, 2014). 

Experimental Measurement of Mutation Rates 
To measure mutation rates, I generated GFP-carrying viruses 
with all genes derived from A/WSN/1933 (H1N1) by reverse 
genetics as described previously (Bloom et al. 2010). These 
viruses were repeatedly passaged at limiting dilution in 
MDCK-SIAT1-CMV-PB1 cells (Bloom et al. 2010) using low 
serum media (Opti-MEM I with 0.5% heat-inactivated fetal 
bovine serum, 0.3% BSA, 100 U/ml penicillin, 100 Lig/ml strep- 
tomycin, and 100(.ig/ml calcium chloride) — a moderate 
serum concentration was retained and no trypsin was 
added because viruses with the WSN HA and NA are trypsin 
independent (Goto and Kawaoka 1998). These passages were 
performed for 27 replicate populations. For each passage, 
100 uJ containing the equivalent of 2ul of virus collection 
was added to the first row of a 96-well plate. The virus was 
serially diluted 1:5 down the plate, such that at the conclusion 
of the dilutions, each well contained 80 uJ of virus dilution. 
MDCK-SIAT1-CMV-PB1 cells were then added to each well in 



a 50j.il volume containing 2.5 x 10 3 cells. The plates were 
grown for approximately 80 h, and wells were examined for 
cytopathic effect indicative of viral growth. The last well with 
cytopathic effect was collected and used as the parent pop- 
ulation for the next round of limiting-dilution passage. 

After 25 limiting-dilution passages, 10 of the 27 viral pop- 
ulations no longer caused any visible GFP expression in the 
cells in which they caused cytopathic effect, indicating fixa- 
tion of a mutation that ablated GFP fluorescence. The 17 
remaining populations all caused fluorescence in infected 
cells, although in some cases the intensity was visibly 
reduced — these populations therefore must have retained 
at least a partially functional GFP. Total RNA was purified 
from each viral population, the PB1 segment was reverse 
transcribed using the primers CATGATCGTCTCGTATTAGTAG 
AAACAAGGCATTTTTTCATGAAGGACAAGC and CATGATCGTC 
tcagggagcgaaagcaggcaaaccatttgattgg, and the re- 
verse-transcribed cDNA was amplified by conventional PCR 
using the same primers. For 22 of the 27 replicate viral pop- 
ulations, this process amplified an insert with the length ex- 
pected for the full GFP-carrying PB1 segment. For two of the 
replicates, this amplified inserts between 0.4 and 0.5 kb 
shorter than the expected length, suggesting an internal de- 
letion in part of the segment. For three replicates, this failed to 
amplify any insert, suggesting total loss of the GFP-carrying 
PB1 segment, a very large internal deletion, or rearrangement 
that rendered the reverse-transcription primers ineffective. 
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For the 24 replicates from which an insert could be amplified, 
the GFP coding region was Sanger sequenced to determine 
the consensus sequence. The results are in tables 1 and 2. 

To estimate R m ^ n , it is necessary to normalize by the nu- 
cleotide composition of the GFP gene. The numbers of each 
nucleotide in this gene are N A = 175, N T = 103, N c = 241, 
and N c = 201. Given that the counts in table 2 come after 25 
passages of 24 replicates: 

1 N m ^„ + ry^ nc + 1 

(5) 

where N m ^„ is the number of observed mutations from m to 
n in table 2, m c indicates the complement of DNA nucleotide 
m (e.g., A c = T). The one in the numerator is a pseudocount 
added to the observed counts of each type of mutation to 
avoid estimating rates of zero. The values of R m ^ n estimated 
from equation (5) give the probability that a nucleotide that 
is already m will mutate to n in a single tissue-culture 
generation. 

Construction of NP Codon-Mutant Libraries 

The goal was to construct a mutant library with an average of 
two to three random codon mutations per gene. Most tech- 
niques for creating mutant libraries of full-length genes, such 
as error-prone PCR (Cirino et al. 2003) and chemical muta- 
genesis (Neylon 2004), introduce mutations at the nucleotide 
level, meaning that codon substitutions involving multiple 
nucleotide changes occur at a negligible rate. Recently, several 
groups have developed strategies for introducing codon mu- 
tations along the lengths of entire genes (Firnberg and 
Ostermeier 2012; Jain and Varadarajan 2014, Kitzman J and 
Shendure J, personal communication). Most of these strate- 
gies are designed to create exactly one codon mutation per 
gene. For my experiments, it was desirable to introduce a 
distribution of around one to four codon mutations per 
gene to examine the average effects of mutations in a variety 
of closely related genetic backgrounds. Therefore, I devised a 
codon-mutagenesis protocol specifically for this purpose. 

This technique involved iterative rounds of low-cycle PCR 
with pools of mutagenic synthetic oligonucleotides that each 
contain a randomized NNN triplet at a specific codon site. 
Two replicate libraries each of the WT and N334H variants 
of the Aichi/1968 NP were prepared in full biological dupli- 
cate, beginning each with independent preps of the plasmid 
templates pHWAichi68-NP and pHWAichi68-NP-N334H. 
The sequences of the NP genes in these plasmids are provided 
in Gong et al. (2013). To avoid cross-contamination, all puri- 
fication steps used an independent gel for each sample, with 
the relevant equipment thoroughly washed to remove resid- 
ual DNA. 

First, for each codon except for that encoding the initiating 
methionine in the 498-residue NP gene, I designed an oligo- 
nucleotide that contained a randomized Knxnxr nucleotide trip- 
let preceded by the 16 nucleotides upstream of that codon in 
the NP gene and followed by the 16 nucleotides downstream 
of that codon in the NP gene. I ordered these oligonucleotides 



in 96-well plate format from Integrated DNA Technologies 
and combined them in equimolar quantities to create the 
forward-mutagenesis primer pool. I also designed and ordered 
the reverse complement of each of these oligonucleotides and 
combined them in equimolar quantities to create the reverse- 
mutagenesis pool. The primers for the N334H variants dif- 
fered only for those that overlapped the N334H codon. I also 
designed end primers that annealed to the termini of the NP 
sequence and contained sites appropriate for BsmBI cloning 
into the influenza reverse-genetics plasmid pHW2000 
(Hoffmann et al. 2000). These primers are 5'-BsmBI-Aichi68- 
NP (catgatcgtctcagggagcaaaagcagggtagataat- 
cactcacag) and 3'-BsmBI-Aichi68-NP (catgatcgtctcg 
tattagtagaaacaagggtatttttcttta). 

I set up PCR reactions that contained 1 (xl of 10ng/ul 
template pHWAichi68-NP plasmid (Gong et al. 2013), 25 uJ 
of 2x KOD Hot Start Master Mix (product number 71842, 
EMD Millipore), 1.5 ul each of 10 uM solutions of the end 
primers 5'-BsmBI-Aichi68-NP and 3'-BsmBI-Aichi68-NP, and 
21 uJ of water. I used the following PCR program (referred to 
as the amplicon PCR program in the remainder of this article): 

1) 95°Cfor2min. 

2) 95°Cfor20s. 

3) 70°Cfor 1s. 

4) 50 °C for 30 s cooling to 50 °C at 0.5°C/s. 

5) 70°Cfor40s. 

6) Repeat steps 2 through 5 for 24 additional cycles. 

7) Hold4°C 

The PCR products were purified over agarose gels using 
ZymoClean columns (product number D4002, Zymo 
Research) and used as templates for the initial codon muta- 
genesis fragment PCR. 

Two fragment PCR reactions were run for each template. 
The forward-fragment reactions contained 15 uJ of 2x KOD 
Hot Start Master Mix, 2 ul of the forward mutagenesis primer 
pool at a total oligonucleotide concentration of 4.5 uM, 2 ul 
of 4.5 uM 3'-BsmBI-Aichi68-NP, 4 ul of 3 ng/uJ of the afore- 
mentioned gel-purified linear PCR product template, and 7 ul 
of water. The reverse-fragment reactions were identical 
except that the reverse mutagenesis pool was substituted 
for the forward mutagenesis pool and that 5'-BsmBI- 
Aichi68-NP was substituted for 3'-BsmBI-Aichi68-NP. The 
PCR program for these fragment reactions was identical to 
the amplicon PCR program except that it utilized a total of 7 
rather than 25 thermal cycles. 

The products from the fragment PCR reactions were di- 
luted 1:4 in water. These dilutions were then used for the 
joining PCR reactions, which contained 15 uJ of 2x KOD 
Hot Start Master Mix, 4uJ of the 1:4 dilution of the 
forward-fragment reaction, 4ul of the 1:4 dilution of the re- 
verse-fragment reaction, 2uJ of 4.5 uM 5'-BsmBI-Aichi68-NP, 
2 u.1 of 4.5 u.M 3'-BsmBI-Aichi68-NP, and 3 uJ of water. The 
PCR program for these joining reactions was identical to the 
amplicon PCR program except that it utilized a total of 
20 rather than 25 thermal cycles. The products from these 
joining PCRs were purified over agarose gels. 
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The purified products of the first joining PCR reactions 
were used as templates for a second round of fragment reac- 
tions followed by joining PCRs. These second-round products 
were used as templates for a third round. The third-round 
products were purified over agarose gels, digested with BsmBI 
(product number R0580L, New England Biolabs), and ligated 
into a dephosphorylated (Antarctic Phosphatase, product 
number M0289L, New England Biolabs) BsmBI digest of 
pHW2000 (Hoffmann et al. 2000) using T4 DNA ligase. The 
ligations were purified using ZymoClean columns, electropo- 
rated into ElectroMAX DH10B T1 phage-resistant competent 
cells (product number 12033-015, Invitrogen), and plated 
on LB plates supplemented with 100ug/ml of ampicillin. 
These transformations yielded between 400,000 and 800,000 
unique transformants per plate, as judged by plating a 1:4,000 
dilution of the transformations on a second set of plates. 
Transformation of a parallel no-insert control ligation yielded 
approximately 50-fold fewer colonies, indicating that self 
ligation of pHW2000 only accounts for a small fraction of 
the transformants. For each library, I performed three trans- 
formations, grew the plates overnight, and then scraped 
the colonies into liquid LB supplemented with ampicillin 
and mini-prepped several hours later to yield the plasmid 
mutant libraries. These libraries each contained in excess of 
10 6 unique transformants, most of which will be unique 
codon mutants of the NP gene. 

I sequenced the NP gene for 30 individual clones drawn 
from the four mutant libraries. As shown in figure 1, the 
number of mutations per clone was approximately Poisson 
distributed and the mutations occurred uniformly along the 
primary sequence. If all codon mutations are made with equal 
probability, 9/63 of the mutations should be single-nucleotide 
changes, 27/63 should be two-nucleotide changes, and 27/63 
should be three-nucleotide changes. This is approximately 
what was observed in the Sanger-sequenced clones. The nu- 
cleotide composition of the mutated codons was roughly 
uniform, and there was no tendency for clustering of multiple 
mutations in primary sequence. The results of this Sanger 
sequencing are compatible with the mutation frequencies 
obtained from deep sequencing the "mutDNA" samples 
after subtracting off the sequencing error rate estimated 
from the DNA samples (fig. 3), especially considering that 
the statistics from the Sanger sequencing are subject to sam- 
pling error due to the limited number of clones analyzed. 

Viral Growth and Passage 

Two independent replicates of viral growth and passage were 
performed (replicates A and B). The procedures were similar 
between replicates, but there were a few small differences. 
In the actual experimental chronology, replicate B was 
performed first, and the modifications in replicate A were 
designed to improve the sampling of the mutations by the 
created mutant viruses. These modifications may be the 
reason why replicate A slightly outperforms replicate B 
by two objective measures: The viruses more completely 
sample the codon mutations (fig. 4), and the evolutionary 
model derived solely from replicate A gives a higher likelihood 



than the evolutionary model derived solely from replicate B 
(tables 6 and 7). 

For replicate B, I used reverse genetics to rescue viruses 
carrying the Aichi/1968 NP or one of its derivatives, PB2 and 
PA from the A/Nanchang/933/1995 (H3N2), a PB1 gene seg- 
ment encoding CFP, and HA/NA/M/NS from A/WSN/1933 
(H1N1) strain. With the exception of the variants of NP used, 
these viruses are identical to those described in Gong et al. 
(2013) and were rescued by reverse genetics in 293T-CMV- 
Nan95-PB1 and MDCK-SIAT1-CMV-Nan95-PB1 cells as de- 
scribed in that reference. The previous section describes four 
NP codon-mutant libraries, two of the WT Aichi/1968 gene 
(WT-1 and WT-2) and two of the N334H variant (N334H-1 
and N334H-2). I grew mutant viruses from all four mutant 
libraries and four paired unmutated viruses from indepen- 
dent preps of the parent plasmids. A major goal was to main- 
tain diversity during viral creation by reverse genetics — the 
experiment would obviously be undermined if most of the 
rescued viruses derived from a small number of transfected 
plasmids. I therefore performed the reverse genetics in 15-cm 
tissue culture dishes to maximize the number of transfected 
cells. Specifically, 15 cm dishes were seeded with 10 7 293T- 
CMV-Nan95-PB1 cells in D10 media (DM EM with 10% heat- 
inactivated fetal bovine serum, 2mM L-glutamine, 100U/ml 
penicillin, and 100(ig/ml streptomycin). At 20 h postseeding, 
the dishes were transfected with 2.8 (.ig of each of the eight 
reverse-genetics plasmids. At 20 h posttransfection, about 
20% of the cells expressed GFP (indicating transcription by 
the viral polymerase of the GFP encoded by pHH-PBIflank- 
eGFP), suggesting that many unique cells were transfected. 
At 20 h posttransfection, the media was changed to the low 
serum media described above. At 78 h posttransfection, the 
viral supernatants were collected, clarified by centrifugation 
at 2,000 x g for 5 min, and stored at 4°C The viruses were 
titered by flow cytometry as described previously (Gong et al. 
2013). A control lacking the NP gene yielded no infectious 
virus as expected. 

The virus was then passaged in MDCK-SIAT1-CMV- 
Nan95-PB1 cells. These cells were seeded into 15 cm dishes, 
and when they had reached a density of 10 7 per plate, they 
were infected with 10 s infectious particles (multiplicity of 
infection (MOI) of 0.1) of the transfectant viruses in low 
serum media. After 18 h, 30-50% of the cells were green as 
judged by microscopy, indicating viral spread. At 40 h 
posttransfection, 100% of the cells were green, and many 
showed clear signs of cytopathic effect. At this time, the 
viral supernatants were again collected, clarified, and stored 
at 4°C NP cDNA isolated from these viruses was the source 
the deep-sequencing samples "virus-pi" and "mutvirus-p1" in 
figure 2. The virus was then passaged a second time exactly as 
before (again using an MOI of 0.1). NP cDNA from these 
twice-passaged viruses constituted the source for the samples 
"virus-p2" and "mutvirus-p2" in figure 2. 

For replicate A, all viruses (both the four mutant viruses 
and the paired unmutated controls) were regrown indepen- 
dently from the same plasmid preps used for replicate B. The 
experimental process was identical to that used for replicate B 
except for the following: Standard influenza viruses (rather 



1971 



Bloom ■ doi:10.1093/molbev/msu173 



MBE 



than the GFP-carrying variants) were used, so plasmid 
pHW-Nan95-PB1 (Gong et al. 2013) was substituted for 
pHH-PB1flank-eGFP during reverse genetics, and 293T and 
MDCK-SIAT1 cells were substituted for the PB1 -expressing 
variants. Rather than creating the viruses by transfecting a 
single 15-cm dish, each sample was created by transfecting 
two 12-well dishes, with the dishes seeded at 3 x 10 5 293T 
and 5 x 10 4 MDCK-SIAT1 cells prior to transfection. The pas- 
saging was then done in four 10 cm dishes for each sample, 
with the dishes seeded at 4 x 10 6 MDCK-SIAT1 cells 12-14 h 
prior to infection. The passaging was still done at an MOI of 
0.1. These modifications were designed to increase diversity in 
the viral population. These viruses were titered by TCID50 
rather than flow cytometry. 

Sample Preparation and lllumina Sequencing 
For each sample, a PCR amplicon was created to serve as the 
template for lllumina sequencing. The steps used to generate 
the PCR amplicon for each of the seven sample types (fig. 2) 
are listed below. Once the PCR template was generated, for all 
samples the PCR amplicon was created using the amplicon 
PCR program described above in 50 pi reactions consisting of 
25 uJ of 2x KOD Hot Start Master Mix, 1.5 ul each of 10 pM 
of 5'-BsmBI-Aichi68-NP and 3'-BsmBI-Aichi68-NP, the indi- 
cated template, and ultrapure water. A small amount of 
each PCR reaction was run on an analytical agarose gel to 
confirm the desired band. The remainder was then run on its 
own agarose gel without any ladder (to avoid contamination) 
after carefully cleaning the gel rig and all related equipment. 
The amplicons were excised from the gels, purified over 
ZymoClean columns, and analyzed using a NanoDrop to 
ensure that the absorbance at 260 nm was at least 1.8 times 
that at 230 nm and 280 nm. The templates were as follows: 

• DNA: The templates for these amplicons were 10 ng of the 
unmutated independent plasmid preps used to create the 
codon mutant libraries. 

• mutDNA: The templates for these amplicons were 10 ng of 
the plasmid mutant libraries. 

• RNA: This amplicon quantifies the net error rate of tran- 
scription and reverse transcription. Because the viral RNA 
is initially transcribed from the reverse-genetics plasmids 
by RNA polymerase I, but the bidirectional reverse-genetics 
plasmids direct transcription of RNA by both RNA poly- 
merases I and II (Hoffmann et al. 2000), the RNA templates 
for these amplicons were transcribed from plasmids de- 
rived from pHH21 (Neumann et al. 1999), which only di- 
rects transcription by RNA polymerase I. The unmutated 
WT and N334H NP genes were cloned into this plasmid to 
create pHH-Aichi68-NP and pHH-Aichi68-NP-N334H. 
Independent preparations of these plasmids were trans- 
fected into 293T cells, transfecting 2 ).ig of plasmid into 
5 x 10 5 cells in six-well dishes. After 32 h, total RNA was 
isolated using Qiagen RNeasy columns and treated with 
the Ambion TURBO DNA-free kit (Applied Biosystems 
AM1907) to remove residual plasmid DNA. This RNA 
was used as a template for reverse transcription with 
AccuScript (Agilent 200820) using the primers 5'-BsmBI- 



Aichi68-NP and 3'-BsmBI-Aichi68-NP. The resulting cDNA 
was quantified by quantitative PCR (qPCR) specific for NP 
(see below), which showed high levels of NP cDNA in the 
reverse-transcription reactions but undetectable levels in 
control reactions lacking the reverse transcriptase, indicat- 
ing that residual plasmid DNA had been successfully re- 
moved. A volume of cDNA that contained at least 2 x 10 s 
NP cDNA molecules (as quantified by qPCR) was used as 
template for the amplicon PCR reaction. Control PCR 
reactions using equivalent volumes of template from the 
no reverse-transcriptase control reactions yielded no 
product. 

• virus-p1: This amplicon was derived from virus created 
from the unmutated plasmid and collected at the end of 
the first passage. Clarified virus supernatant was ultracen- 
trifuged at 64,000 x g for 1.5 hat 4 °C, and the supernatant 
was decanted. Total RNA was then isolated from the viral 
pellet using a Qiagen RNeasy kit. This RNA was used as a 
template for reverse transcription with AccuScript using 
the primers 5'-BsmBI-Aichi68-NP and 3'-BsmBI-Aichi68- 
NP. The resulting cDNA was quantified by qPCR, which 
showed high levels of NP cDNA in the reverse-transcrip- 
tion reactions but undetectable levels in control reactions 
lacking the reverse transcriptase. A volume of cDNA that 
contained at least 10 7 NP cDNA molecules (as quantified 
by qPCR) was used as template for the amplicon PCR 
reaction. Control PCR reactions using equivalent volumes 
of template from the no reverse-transcriptase control 
reactions yielded no product. 

• virus-p2, mutvirus-p1, and mutvirus-p2: These amplicons 
were created as for the virus-p1 amplicons but used the 
appropriate virus as the initial template as outlined in 
figure 2. 

An important note: It was found that the use of relatively 
new RNeasy kits with (3-mercaptoethanol (a reducing agent), 
freshly added per the manufacturer's instructions, was neces- 
sary to avoid what appeared to be oxidative damage to 
purified RNA. 

The overall experiment only makes sense if the sequenced 
NP genes derive from a large diversity of initial template mol- 
ecules. Therefore, qPCR was used to quantify the molecules 
produced by reverse transcription to ensure that a suffi- 
ciently large number were used as PCR templates to create 
the amplicons. The qPCR primers were 5'-Aichi68-NP-for 
(gcaacagctggtctgactcaca) and 3'-Aichi68-NP-rev 
(tccatgccggtgcgaacaag). The qPCR reactions were 
performed using the SYBR Green PCR Master Mix (Applied 
Biosystems 4309155) following the manufacturer's instruc- 
tions. Linear NP PCR-ed from the pHWAichi68-NP plasmid 
was used as a quantification standard — the use of a linear 
standard is important, because amplification efficiencies differ 
for linear and circular templates (Hou et al. 2010). The stan- 
dard curves were linear with respect to the amount of NP 
standard over the range from 10 2 to 10 9 NP molecules. These 
standard curves were used to determine the absolute number 
of NP cDNA molecules after reverse transcription. Note that 
the use of only 25 thermal cycles in the amplicon PCR 
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program provides a second check that there are a substantial 
number of template molecules, as this moderate number of 
thermal cycles will not lead to sufficient product if there are 
only a few template molecules. 

To allow the lllumina sequencing inserts to be read in both 
directions by paired-end 50 nt reads (supplementary fig. SI, 
Supplementary Material online), it was necessary to us an 
lllumina library-prep protocol that created NP inserts that 
were roughly 50 nt in length. This was done via a modification 
of the lllumina Nextera protocol. First, concentrations of the 
PCR amplicons were determined using PicoGreen (Invitrogen 
P7859). These amplicons were used as input to the lllumina 
Nextera DNA Sample Preparation kit (lllumina FC-121-1031). 
The manufacturer's protocol for the tagmentation step was 
modified to use 5-fold less input DNA (10 ng rather than 
50 ng) and 2-fold more tagmentation enzyme (10 uJ rather 
than 5 pi), and the incubation at 55 °C was doubled from 5 to 
10min. Samples were barcoded using the Nextera Index Kit 
for 96 indices (lllumina FC-121-1012). For index 1, the barcod- 
ing was DNA with N701, RNA with N702, mutDNA with 
N703, virus-pi with N704, mutvirus-p1 with N705, virus-p2 
with N706, and mutvirus-p2 with N707. After completion of 
the Nextera PCR, the samples were subjected to a ZymoClean 
purification rather than the bead cleanup step specified in the 
Nextera protocol. The size distribution of these purified PCR 
products was analyzed using an Agilent 200 TapeStation 
Instrument. If the NP sequencing insert is exactly 50 nt in 
size, then the product of the Nextera PCR should be 186 nt 
in length after accounting for the addition of the Nextera 
adaptors. The actual size distribution was peaked close to 
this value. The ZymoClean-purified PCR products were quan- 
tified using PicoGreen and combined in equal amounts into 
pools: A WT-1 pool of the seven samples for that library, a 
WT-2 pool of the seven samples for that library, etc. These 
pools were subjected to further size selection by running 
them on a 4% agarose gel versus a custom ladder containing 
171 and 196 nt bands created by PCR from a GFP template 
using the forward primer gcacggggccgtcgccg and the 
reverse primers tggggcacaagctggagtacaac (for the 
171 nt band) and gacttcaaggaggacggcaacatcc (for 
the 196 nt band). The gel slice for the sample pools corre- 
sponding to sizes between 171 and 196 nt was excised and 
purified using a ZymoClean column. A separate clean gel was 
run for each pool to avoid cross contamination. 

Library QC and cluster optimization were performed using 
Agilent Technologies qPCR NGS Library Quantification Kit 
(Agilent Technologies, Santa Clara, CA). Libraries were intro- 
duced onto the flow cell using an lllumina cBot (lllumina, Inc., 
San Diego, CA) and a TruSeq Rapid Duo cBot Sample Loading 
Kit. Cluster generation and deep sequencing was performed 
on an lllumina HiSeq 2500 using an lllumina TruSeq Rapid PE 
Cluster Kit and TruSeq Rapid SBS Kit. A paired-end, 50 nt 
read-length (PE50) sequencing strategy was performed in 
rapid run mode. Image analysis and base calling were per- 
formed using lllumina's Real Time Analysis v1. 17.20.0 soft- 
ware, followed by demultiplexing of indexed reads and 
generation of FASTQ files, using lllumina's CASAVA v1.8.2 
software (http://www.illumina.com/software.ilmn, last 



accessed May 31, 2014). These FASTQ files were uploaded 
to the Sequence Read Archive (SRA) under accession 
SRP036064 (see http://www.ncbi.nlm.nih.gov/sra/?term= 
SRP036064, last accessed May 31, 2014). 

Read Alignment and Quantification of Mutation 
Frequencies 

A custom Python software package, mapmuts, was created to 
quantify the frequencies of mutations from the lllumina se- 
quencing. A description of the software as utilized in this work 
is available at http://jbloom.github.io/mapmuts/example_ 
2013Analysis_lnfluenza_NP_Aichi68.html (last accessed May 
31, 2014). Briefly: 

1) Reads were discarded if either read in a pair failed the 
lllumina chastity filter, had a mean Q-score less than 25, 
or had more than two ambiguous (n) nucleotides. 

2) The remaining paired reads were aligned to each other, 
and retained only if they shared at least 30 nt of overlap, 
disagreed at no more than one site, and matched the 
expected terminal lllumina adaptors with no more than 
one mismatch. 

3) The overlap of the paired reads was aligned to NP, 
disallowing alignments with gaps or more than six nu- 
cleotide mismatches. A small fraction of alignments cor- 
responded exclusively to the noncoding termini of the 
viral RNA; the rest contained portions of the NP coding 
sequence. 

4) For every paired read that aligned with NP, the codon 
identity was called if both reads concurred for all three 
nucleotides in the codon. If the reads disagreed or con- 
tained an ambiguity in that codon, the identity was not 
called. 

Inference of the Amino Acid Preferences 
The approach described here is based on the assumption that 
there is an inherent preference for each amino acid at each 
site in the protein. This assumption is clearly not completely 
accurate, as the effect of a mutation at one site can be influ- 
enced by the identities of other sites. However, experimental 
work with NP (Gong et al. 2013) and other proteins (Serrano 
et al. 1993; Bloom et al. 2005, 2006; Bershtein et al. 2006) 
suggests that at an evolutionary level, sites interact mostly 
through generic effects on stability and folding. Furthermore, 
the effects of mutations on stability and folding tend to be 
conserved during evolution (Serrano et al. 1993; Ashenberg 
et al. 2013). So one justification for assuming site-specific but 
site-independent preferences is that selection on a mutation 
is mostly determined by whether the protein can tolerate its 
effect on stability or folding so stabilizing amino acids will be 
tolerated in most genetic backgrounds, whereas destabilizing 
amino acids will only be tolerated in some backgrounds, as 
has been described experimentally (Gong et al. 2013) and 
theoretically (Bloom et al. 2007). A more pragmatic justifica- 
tion is that the work here builds off this assumption to create 
evolutionary models that are much better than existing 
alternatives. 
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Assume that the preferences are entirely at the amino acid 
level and are indifferent to the specific codon (the study of 
preferences for synonymous codons is an interesting area for 
future work). Denote the preference of site r for amino acid a 
as it, a , where 



(6) 



Define TT r ,a/n r , a > as the expected ratio of amino acid a to a' 
after viral growth if both are initially introduced into the 
mutant library at equal frequency. Mutations that enhance 
viral growth will have larger values of jr^ a , whereas mutations 
that hamper growth will have lower values of Jt^ a . However, 
Jtr.a/Xr^' cannot be simply interpreted as the fitness effect of 
mutating site r from a to a': Because most clones have mul- 
tiple mutations, this ratio summarizes the effect of a mutation 
in a variety of related genetic backgrounds. A mutation can 
therefore have a ratio greater than one due to its inherent 
effect on viral growth or its effect on the tolerance for other 
mutations (or both). This analysis does not separate these 
factors, but experimental work (Cong et al. 2013) has shown 
that it is fairly common for one mutation to NP to alter the 
tolerance to a subsequent one. 

The most naive approach is to set Jt^ a proportional to the 
frequency of amino acid a in mutvirus-p1 divided by its fre- 
quency in mutDNA and then apply the normalization in 
equation (6). However, such an approach is problematic for 
several reasons. First, it fails to account for errors (PCR, reverse 
transcription) that inflate the observed frequencies of some 
mutations. Second, estimating ratios by dividing finite counts 
is notoriously statistically biased (Pearson 1910; Ogliore et al. 
2011). For example, in the limiting case where a mutation is 
counted once in mutvirus-p1 and not at all in mutDNA, the 
ratio is infinity — yet in practice such low counts give little 
confidence that enough variants have been assayed to esti- 
mate the true effect of the mutation. 

To circumvent these problems, I used an approach that 
explicitly accounts for the sampling statistics. The approach 
begins with prior estimates that the ir^ a values are all equal 
and that the error and mutation rates for each site are equal 
to the library averages. Multinomial likelihood functions give 
the probability of observing a set of counts given the it^ a 
values and the various error and mutation rates. The posterior 
mean of the n r a values is estimated by MCMC 

Use the counts in DNA to quantify errors due to PCR and 
sequencing. Use the counts in RNA to quantify errors due to 
reverse transcription. Assume that transcription of the viral 
genes from the reverse-genetics plasmids and subsequent 
replication of these genes by the influenza polymerase intro- 
duces a negligible number of new mutations. The second of 
these assumptions is supported by the fact that the mutation 
frequency in virus-p1 is close to that in RNA (fig. 3). The first 
of these assumptions is supported by the fact that stop 
codons are no more frequent in RNA than in virus-p1 
(fig. 3) — deleterious stop codons arising during transcription 
will be purged during viral growth, while those arising from 
reverse transcription and sequencing errors will not. 



At each site r, there are n codon codons, indexed by /' = 1, 
2, • ■ ■ icodon. Let wt(r) denote the WT codon at site r. Let 
N° NA be the total number of sequencing reads at site r in 
DNA, and let n™ A be the number of these reads that report 



codon / at site r, so that X! n ° NA 
N 



Njr . Similarly, let 



mutDNA , N RNA , and N™ wim! be the total number of reads 



at site r and let n 



mutDNA „RNA 



and n" 



be the total 



number of these reads that report codon / at site r in 
mutDNA, RNA, and mutvirus-pl, respectively. 

First consider the rate at which site r is erroneously read as 
some incorrect identity due to PCR or sequencing errors. Such 
errors are the only source of non-WT reads in the sequenc- 
ing of DNA. For all i ^ wt(r), define e r ;, as the rate at which 
site r is erroneously read as codon /' in DNA. Define 
e r ,wt(r) = 1 — S,/wt(r) € r,i t0 t> e the rate at which site r is 
correctly read as its WT identity of wt(r) in DNA. Then 



value. 



E 



/i° NA /N° NA l where E denotes the expectation 
Define ~t r = (e r , n , . . . ,e r>n(xjdon ) and 

(",T> ■ • ■ < N l) as vectors of the «rj and n™ A 



values, so the likelihood of observing n r 
is 



given e r and N 



DNA 



Pr n 



Mult^n 



DNA. N DNA 



(7) 



where Mult denotes the multinomial distribution. 

Next consider the rate at which site r is erroneously copied 
during reverse transcription. These reverse-transcription 
errors combine with the PCR/sequencing errors defined by 
~e r to create non-WT reads in RNA. For all / ^ wt(r), define 
p r j as the rate at which site r is miscopied to / during reverse 
transcription. Define p rm(r) = 1 - X^wt(r) p r j as the 
rate at which site r is correctly reverse transcribed. Ignore as 
negligibly rare the possibility that a site is subject to both 
a reverse transcription and sequencing/PCR error within 
the same clone (a reasonable assumption as both e r j 
and p ri are very small for i ^ wt(r)). Then 



i,wt(r) 



E 



„rna /n rna1 wnere SjMr) is the 
Kronecker delta (equal to one if / = wt(r) and zero other- 



+ Prj - 8 



wise). The likelihood of observing 
N RNA is 



given p r , e ri and 



Pr n 



Mult(V 



,e r + p r 



Sr 



(8) 

where S r = (<5 1jWt (r), . . . A^wtw) is a vector that is all 
zeros except for the element wt(r). 

Next consider the rate at which site r is mutated to some 
other codon in the plasmid mutant library. These mutations 
combine with the PCR/sequencing errors defined by e r to 
create non-WT reads in mutDNA. For all /' ^ wt(r), define fi r j 
as the rate at which site r is mutated to codon i in the mutant 
library. Define pt r , wt(r) = 1 - I] ¥wtW Mr,/ as the rate at 
which site r is not mutated. Ignore as negligibly rare the 
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possibility that a site is subject to both a mutation and a 
sequencing/PCR error within the same clone. Then 



wt(r) 



E 



n mutDNA /N mutDNA | Jhe | ike |j hood 



of observing n ™ tDNA given T r , and N™ tDNA is 

p r ^mutDNA | ^mutDNA^^^ 

= Mult^ r mutDNA ; N™ tDNA ,7^ +-t r -~t\ 



(9) 



Finally, consider the effect of the preferences of each site r 
for different amino acids, as denoted by the n rM values. 
Selection due to these preferences is manifested in 
"mutvirus." This selection acts on the mutations in the 
mutant library (fJL r J), although the actual counts in mutvirus 
are also affected by the sequencing/PCR errors (s rj ) and re- 
verse-transcription errors (p r j). Again ignore as negligibly rare 
the possibility that a site is subject to more than one of 
these sources of mutation and error within a single clone. 
Let A(i) denote the amino acid encoded by codon i. Let 
lz r be the > vector of 7T r , 0 values. Define the vector-valued 
function C as 



(10) 



so that this function returns a d codon -element vector con- 
structed from ix r . Because the selection in mutvirus due to 
the preferences 7i fi Mi) occurs after the mutagenesis fi r j 
but before the reverse-transcription errors p rJ and the 



s/ N n 



jtvirus~j 

where 



sequencing/PCR errors e r „ then E 

€r,i + Prj + Yr X 7I r ,A(i) x Mr./ ~ 2 X S; )VW ( r ) 

Yr = (t. KrAQWrA = ( C (n^ ■ jt^ (where • de- 
notes the dot product) is a normalization factor that accounts 
for the fact that changes in the frequency of one variant due 
to selection will influence the observed frequency of other 



variants. The likelihood of observing n™ 

jt, ~? r , ~Pr, ~n r , and N r mutvims is therefore 



given 



Pr n 



mutvirus i — r —r —r —r i .mutvirus \ 

| fi r ,e r ,p r ,jt r ,N r I 



Mult 



,„mutvirus, k/mutvirus ~T , , \ / p' 

n ; N ,e r + p r + — \ 2 b r 



(jtrj O jt r 



) 

(11) 



where o is the Hadamard (entry wise) product. 

Specify priors over e r , p r , [i,, and n r in the form of 
Dirichlet distributions (denoted here by Dir). For the priors 
over the mutation rates [i r , I choose Dirichlet-distribution 
parameters, such that the mean of the prior expectation for 



the mutation rate at each site r and codon /' is equal to the 
average value for all sites, estimated as the frequency in 
mutDNA minus the frequency in DNA (fig. 3), denoted by 
/x. So the prior is 

Pr(pt r ) = D\r(jt r ; n codon • • ct£ r ) (12) 

where a^ r is the n codon -element vector with elements 
a ii,rj = A + <5j,wt(r)(1 - "codon/"-) and <fy is the scalar con- 
centration parameter. 

For the priors over s r j and p ni , the Dirichlet-distribution 
parameters again represent the average value for all sites but 
now also depend on the number of nucleotide changes in the 
codon mutation because sequencing/PCR and reverse- 
transcription errors are far more likely to lead to single- 
nucleotide codon changes than multiple-nucleotide codon 
changes (fig. 3). Let A4(wt(r)j) be the number of nucleotide 
changes in the mutation from codon wt(r) to codon i. For 
example, A1(GCA,ACA) = 1 and .M(GCA,ATA) = 2. Let 
eT, e^, and be the average error rates for one-, two-, and 
three- nucleotide codon mutations, respectively — these are 
estimated as the frequencies in DNA. So the prior is 



PrfiT) = Dir("i^; n codon ■ a e ■ ~a^ r ) 



(13) 



where a e , r is the n codon -element vector with elements 
<*e,r,i = eM(wt(r),i) where 60 = 1- 9 x 67 - 27 x 6^-27 
xej, and where a e is the scalar concentration parameter. 

Similarly, let p7, P2. and pi be the average reverse- 
transcription error rates for one-, two-, and three-nucleotide 
codon mutations, respectively — these are estimated as the 
frequencies in RNA minus those in DNA. So the prior is 



Pr( p r ) = Dir(p^; n codo r, ■ a p ■ ~a^ r ) 



(14) 



where a 



pj is the n codon -element vector with elements 
a p ,r,i = P7W(wt(r),i) where A) = 1- 9xpT-27xpJ-27 
xpj, and where o p is the scalar concentration parameter. 

Specify a symmetric Dirichlet-distribution prior over it, 
(note that any other prior, such as one that favored WT, 
would implicitly favor certain identities based empirically 
on the WT sequence, and so would not be in the spirit of 
the parameter-free derivation of the 7t rfi values employed 
here). Specifically, use a prior of 



Dir^; a n ■ Tj 



(15) 



where 1 is the n aa -element vector that is all ones, and o n is the 
scalar concentration parameter. 

It is now possible to write expressions for the likelihoods 

and posterior probabilities. Let J\f r 



DNA mutDNA RNA 



n mutvirus >N D^ jN mutDNA jN RNA >N mutvirusj denote the M ^ 

of counts for site r. The likelihood of J\f r given values for 
the preferences and mutation/error rates is 
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Pr(j\f r | IhX, ~Pr,Jh) 



= Pr| n? NA | N™\t ) x Pr! n rna | N RNA , ^, Jr 



x Pr 



,„mutDNA i M mutDNA — > — > 



(16) 



where the likelihoods that compose equation (16) are denned 
by equations (7-11). The posterior probability of a specific 
value for the preferences and mutation/error rates is 

Pr(^, ~e^,~p r ,jt | Af r j = C f x Pr^A/",- 1 7? r , i^p^/^J x 

(17) 

Pr(i^) x Pr(p^) x Pr(/^) x Pr^) (18) 

where C r is a normalization constant that does not need to be 
explicitly calculated in the MCMC approach used here. The 
posterior over the preferences Jt r can be calculated by inte- 
grating over equation (17) to give 

Pr(^|Af r ) = j j j Pr( 

(19) 

where the integration is performed by MCMC. The posterior 
is summarized by its mean, 



flt r 



x Prl it, 



AC 1 1 < k < n 



(20) 



In practice, each replicate consists of four libraries (WT-1, 
WT-2, N334H-1, and N334H-2)— the posterior mean prefer- 
ences inferred for each library within a replicate are averaged 
to give the estimated preferences for that replicate. The pref- 
erences within each replicate are highly correlated regardless 
of whether mutvirus-p1 or mutvirus-p2 is used as the mut- 
virus data set (fig. 5A and B). This correlation between pas- 
sages is consistent with the interpretation of the preferences 
as the fraction of genetic backgrounds that tolerate a muta- 
tion (if it was a selection coefficient, there should be further 
enrichment upon further passage). The preferences averaged 
over both replicates serve as the "best" estimate and are dis- 
played in figure 5D. This figure was created using the 
WebLogo 3 program (Schneider and Stephens 1990; Crooks 
et al. 2004). 

Figure 5D also shows relative solvent accessibility (RSA) 
and secondary structure for residues present in chain C of 
NP crystal structure PDB 2IQH (Ye et al. 2006). The total 
accessible surface area (ASA) and the secondary structure 
for each residue in this monomer alone were calculated 
using DSSP (Kabsch and Sander 1983; Joosten et al. 2011). 
The RSAs are the total ASA divided by the maximum ASA 
defined in Tien et al. (2013). The secondary structure codes 
returned by DSSP were grouped into three classes: Helix 



(DSSP codes G, H, or I), strand (DSSP codes B or E), and 
loop (any other DSSP code). 

Phylogenetic Analyses 

A set of NP coding sequences was assembled for human 
influenza lineages descended from a close relative the 1918 
virus (H1N1 from 1918 to 1957, H2N2 from 1957 to 1968, 
H3N2 from 1968 to 2013, and seasonal H1N1 from 1977 to 
2008). All full-length NP sequences from the Influenza Virus 
Resource (Bao et al. 2008) were downloaded, and up to three 
unique sequences per year from each of the four lineages 
described above were retained. These sequences were aligned 
using EMBOSS needle (Rice et al. 2000). Outlier sequences 
that correspond to heavily lab-adapted strains, lab recombi- 
nants, misannotated sequences, or zoonotic transfers (e.g., a 
small number of human H3N2 strains are from zoonotic 
swine variant H3N2 rather than the main human H3N2 lin- 
eage) were removed. This was done by first removing known 
outliers in the influenza databases (Krasnitz et al. 2008) and 
then using an analysis with RAxML (Stamatakis 2006) and 
Path-O-Gen (http://tree.bio.ed.ac.uk/software/pathogen/, last 
accessed May 31, 2014) to remove remaining sequences that 
were extreme outliers from the molecular clock. The final 
alignment after removing outliers consisted of 274 unique 
NP sequences. 

Maximum-likelihood phylogenetic trees were constructed 
using codonPhyML (Gil et al. 2013). Two substitution models 
were used. The first was GY94 (Goldman and Yang 1994) 
using CF3x4 equilibrium frequencies (Pond et al. 2010), a 
single transition-transversion ratio optimized by maximum 
likelihood, and a synonymous-nonsynonymous ratio drawn 
from four discrete gamma-distributed categories with mean 
and shape parameter optimized by maximum likelihood 
(Yang et al. 2000). The second was KOSI07 + F (Kosiol et al. 
2007), optimizing the relative transversion-transition ratio by 
maximum likelihood, and letting the relative synonymous- 
nonsynonymous ratio again be drawn from four gamma- 
distributed categories with mean and shape parameter 
optimized by maximum likelihood. The trees produced by 
codonPhyML are unrooted. These trees were rooted using 
Path-O-Gen (http://tree.bio.ed.ac.uk/software/pathogen/, 
last accessed May 31, 2014) and visualized with FigTree 
(http://tree.bio.ed.ac.uk/software/figtree/, last accessed May 
31, 2014) to create the images in figure 7. The tree 
topologies are extremely similar for both models. 

The evolutionary models were compared by using them to 
optimize the branch lengths of the fixed tree topologies in 
figure 7 so as to maximize the likelihood using HYPHY (Pond 
et al. 2005) for sites 2-498 (site 1 was not included, because 
the N-terminal methionine is conserved and was not mu- 
tated in the plasmid mutant libraries). HYPHY was used to 
calculate all likelihoods (even for models that could be han- 
dled by codonPhyML) for consistency in case these programs 
differ slightly in numerical accuracy. The results are shown in 
tables 6 and 7. Regardless of which tree topology was used, the 
experimentally determined evolutionary models outper- 
formed all variants of GY94 and KOSI07+F. The 
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experimentally determined evolutionary models performed 
best when using the preferences determined from the com- 
bined data from both replicates and using equation (3) to 
compute the fixation probabilities. Using the data from just 
one replicate also outperforms GY94 and KOSI07 + F, al- 
though the likelihoods are slightly worse. In terms of the 
completeness with which mutations are sampled in the 
mutant viruses, replicate A is superior to replicate B as dis- 
cussed above — and the former replicate gives higher likeli- 
hoods. If the fixation probabilities are instead determined 
using the method of Halpern and Bruno (Halpern and 
Bruno 1998) as in equation (4), the experimentally deter- 
mined models still outperform GY94 and KOSI07 + F — but 
the likelihoods are substantially worse. To check that the ex- 
perimentally determined models really do utilize the site-spe- 
cific preferences information, the preferences were 
randomized among sites and likelihoods were computed. 
These randomized models perform vastly worse than any of 
the alternatives. 

The variants of GY94 and KOSI07 + F tested are listed in 
table 6. Various methods were used to estimate the non- 
synonymous-synonymous ratio (&)): A single co optimized 
by maximum likelihood; three discrete categories of co < 1, 
co = 1, and co > 1 with the proportions and the co values ^ 1 
estimated by maximum likelihood; co drawn from four 
gamma-distributed categories with mean and shape esti- 
mated by maximum likelihood; and a beta distribution (ten 
categories) plus an additional category of co > 1 with the 
shape parameters, co > 1 value, and proportion in the final 
category estimated by maximum likelihood. These models are 
referred to MO, M2a, M5, and M7 in the literature (Yang et al. 
2005). Another model optimized a different co for each 
branch. Another model optimized a single co but allowed 
the rates to be drawn from four gamma-distributed catego- 
ries. Parameters were counted as follows: All contained equi- 
librium frequency parameters that were empirically estimated 
from the sequences under analysis: There are nine such pa- 
rameters for GY94 using CF3x4 (Goldman and Yang 1994; 
Pond et al. 2010) and 60 such parameters for KOSI07 + F 
(Kosiol et al. 2007). In addition, all variants contain a transi- 
tion-transversion ratio optimized by likelihood. Finally, all 
variants contain one or more co parameters as described 
above. 

Supplementary Material 

Supplementary figures S1 and S2 and files S1 and S2 are 
available at Molecular Biology and Evolution online (http:// 
www.mbe.oxfordjournals.org/). 
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